Structure optimization in an off-lattice protein model 
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We study an off-lattice protein toy model with two species of monomers interacting through 
modified Lennard- Jones interactions. Low energy configurations are optimized using the pruned- 
enriched-Rosenbluth method (PERM), hitherto employed to native state searches only for off lattice 
models. For 2 dimensions we found states with lower energy than previously proposed putative 
ground states, for all chain lengths > 13. This indicates that PERM has the potential to produce 
native states also for more realistic protein models. For d = 3, where no published ground states 
exist, we present some putative lowest energy states for future comparison with other methods. 



Predicting the structure of a protein, given its sequence 
of amino acids, is one of the central problems in compu- 
tational biology. Since the problem is too difficult to be 
approached with fully realistic potentials derived from 
first principles, many authors have studied it in various 
degrees of simplifications. This involves in particular ne- 
glect of solvent water, simplifying the interactions, lump- 
ing together smaller groups of atoms, and putting every- 
thing on a discrete lattice. Among the most radically 
simplified models is the HP model of Dill and coworkers 
Q where each amino acid is treated as a point particle 
on a regular (quadratic or cubic) lattice, and only two 
types of amino acids - hydrophobic (H) and polar (P) 
- are considered. Apart from the forces responsible for 
the connectedness of the chain, the only forces are con- 
tact forces between nearest lattice neighbours which are 
different for HH, HP, and PP pairs. 



Even in this highly simplified model it is far from triv- 
ial to predict the native state for a given amino acid se- 
quence 0, IE 01 IE IS 0- The most efficient algorithms are 
either deterministic and cannot be generalized to more 
realistic models at all 0, or use sequential importance 
sampling with re-sampling in the form of the pruned- 
enriched-Rosenbluth method (PERM) [fj. Although it 
was shown that the latter can be applied also to off-lattice 
homopolymers at higher temperatures 8] , it is not obvi- 
ous that it will be efficient for off-lattice heteropolymers 
at the low temperatures needed for protein folding. 



While there are a large number of benchmark cases 
for lattice protein models in the literature, there ex- 
ist very few simple off-lattice models with known low- 
est energy states that can be used as benchmarks for 
efficient algorithms. One such model is the so-called 
AB model by Stillinger et al [S [ljj which also uses 
only two types of monomers, called now "A" (hydropho- 
bic) and "B" (polar). The distances between consecutive 
monomers along the chain are held fixed to b — 1, while 
non-consecutive monomers interact through a modified 
Lennard- Jones potential. In addition, there is an en- 
ergy contribution from each angle 9i between successive 
bonds. More precisely, the total energy for a N monomer 



chain is expressed as 

N-l N-2 N 

i=2 i=l j=i+2 



where 



EtiPi) = -(l-COB0i), 
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Here is the distance between monomers i and j (with 
i < j). Each Q is either A or B, and C(Q, Q) is +1, +i, 
and —i respectively, for AA,BB, and AB pairs, giving 
strong attraction between AA pairs, weak attraction be- 
tween BB pairs, and weak repulsion between A and B. 

This model has been studied in several papers [S 0, 
ITU IT^ . IT^ | For its 2-d version, putative ground states for 
various AB sequences and for various chain lengths are 
given in p). Ilfl IT3 . Similar models were also studied 
in 0, Il2l Il4 Il5| . but putative ground states for these 
generalizations were not given at all or for very short 
chains only. The methods used to find low energy states 
of the AB model include neural networks 0, conven- 
tional Metropolis type Monte Carlo procedures [T^j > sim- 
ulated tempering multicanonical Monte Carlo [T^ |. 
biologicall y m otivated methods 0, 0], and molecular 
dynamics 14] . In all cases the stochastic minimization 
can only lead to some state in the neighbourhood of a 
local (and hopefully also global) minimum. A greedy de- 
terministic method such as conjugate gradient descent is 
subsequently applied to reach the minimum itself. 

It is the purpose of the present paper to see whether 
PERM can be efficient for energy minimization in the 
AB model. In particular we shall use the new variant 
of PERM presented in |(|. We shall restrict ourselves 
to the subclass of "Fibonacci sequences" studied also in 
[Tof , defined recursively by 



So — A, Si — B, Si+i — Si-i * Si 



(4) 



. Here * is the concatenation operator. The first few 
sequences are = AB, S 3 = BAB, S4 = ABBAB, 
etc. They have lengths given by Ni+i = A^_i + Ni, i.e 
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N = 13 N = 21 




FIG. 1: Putative ground states of Fibonacci sequences listed 
in Table I in 2 — d space. Full dots indicate hydrophobic 
monomers. 



given by the Fibonacci numbers. Hydrophobic residues 
(A) occur isolated along the chain, while Bs occur either 
isolated or in pairs. The fraction of Bs tends to the 
golden mean 7 = 0.618 033 as the length N — > 00. 

Although PERM gives also detailed information about 
excited states and thermodynamic behaviour at tempera- 
tures T > 0, we shall not discuss this here. For studying 
the dynamics of the folding transition, in contrast, we 
would have to assume some realistic microscopic dynam- 
ics. Just like other advanced sampling methods such as 
simulated annealing or parallel tempering, PERM sacri- 
fices the realism of the dynamics for efficiency. In addi- 
tion, as in the studies mentioned above, PERM is only 
used for coming close to the native state, and conjugate 
gradient descent is then used to reach the minimum en- 
ergy state itself. 

PERM is a biased chain growth algorithm with "pop- 
ulation control" , i.e. a sequential importance sampling 
method with re-sampling Eg , implemented recursively 
in a depth first fashion |8|. While chains grow, they 
acquire weights that include both Boltzmann factors 
and bias correction ( "Rosenbluth" ^t|) factors. During 
the growth, samples with large weight are cloned, while 
chains with too small weight are pruned out. Except 
for the depth-first implementation and for the fact that 
it gives the correct Gibbs-Boltzmann statistics, PERM 
resembles therefore genetic algorithms. While the origi- 
nal version of PERM was quite successful for latti ce p ro- 
teins 0, ^| and for a host of other applications [2(| , it 
worked rather poorly for minimization of off-lattice poly- 
mer models |2l|. 

In this paper we therefore employ an improved ver- 
sion called nPERMis in 6] (for "new PERM with im- 
portance sampling"). Basically, instead of making exact 
clones of high weight chains and hoping that these clones 



will evolve differently during the subsequent growth 
(as in original PERM), we now branch such that the 
last monomers are different at the point of branching. 
Thereby we now force the two copies to be distinct, and 
we avoid the loss of diversity that also plagues genetic 
algorithms when the evolution pressure is too high. 



N = 13 




FIG. 2: Stereographic views of putative ground states of 3 — d 
Fibonacci sequences listed in Table I. Again, A monomers are 
shown as filled circles. 

For lattice polymers 6] one has for each partially grown 
chain a finite number of "candidate directions" for the 
next step. One first estimates the total weight of all these 
one-step continuations. Based on this estimate, one de- 
cides on the number of clones to be made. If, say, one 
wants to make k clones, one scans all possible fc-tuplcs 
of possible different candidate directions, and selects one 
of these tuples according to its weight. For off-lattice 
polymers one proceeds exactly in the same way, with one 
exception: The candidates are now no longer the lattice 
bonds, but one has to choose K candidate locations for 
the next monomer randomly. The number K is an im- 
portant parameter. While K « 5 was optimal for 3-d 
polymers near the 8-point we found that lowest en- 
ergies were reached in the AB model for if w 50. While 
it was necessary to make clonings with very many sib- 
lings in simulating the HP model with the old version of 
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TABLE I: Sequences and energies reached. -B pe rm is the lowest energy obtained by a PERM run, while iJ m m is the minimum 
energy obtained b y su bsequent conjugate gradient minimization. E^ in is the putative ground state energy obtained by Stillinger 
and Head-Gordon|lOl (for d = 2 only). 
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13 


ABB ABB AB ABBAB 


-3.2167 


-3.2939 


-3.2235 


-3.9730 


-4.9616 


21 


BABABBABABBABBABABBAB 


-5.7501 


-6.1976 


-5.2881 


-7.6857 


-11.5238 


34 


ABB ABBAB ABB ABBAB ABBAB 
ABB ABBAB ABBAB 


-9.2195 


-10.7001 


-8.9749 


-12.8601 


-21.5678 


55 


BABABBABABBABBABABBAB 


-14.9050 


-18.5154 


-14.4089 


-20.1070 


-32.8843 
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PERM UlEl, 

we now obtained good results by restrict- 
ing ourselves to fc-tuples with k < 3. 

Another important parameter is the temperature at 
which the simulation is run. We typically used temper- 
atures well below the collapse transition, kT w 0.1 or 
even lower. In order to speed up the ground state search, 
we also modified the Lennard-Jones potential by putting 
E2{r) — +00 for r < 1. This hard core constraint reduces 
the available phase space, but has no effect on ground 
state configurations (we did not use it in the conjugate 
gradient minimization, and we checked that it was sat- 
isfied after minimization). For chain deformation algo- 
rithms it could slow down the dynamics, since the hard 
cores could act as barriers, but it can only improve any 
pure chain growth algorithm. Finally, as a last trick, 
we used equally spaced azimuthal angles for all candi- 
dates (with one overall angle chosen at random, for each 
group of candidates), in order to make them cover the 
unit sphere more uniformly. All simulations were done 
on Linux and UNIX workstations. CPU times were up to 
2 days, but their precise values are not very significant. 
Exact timings would involve frequent comparisons of the 
minimizer basins of attraction reached by PERM, which 
we considered as too time consuming. 

In Table I we list the lowest energies thus obtained 
for the two- and three-dimensional AB model for all Fi- 
bonacci sequences with 13 < N < 55. The latter is equal 
to the length of the longest sequence studied in [lj} ■ Let 
us first discuss the case d = 2. For comparison we quote 
also the putative ground state energies from Table II of 
10]. For N < 13, our energies agree perfectly with those 
of [l(|. Except for the shortest chain with N — 13, al- 
ready PERM gave in all cases shown in Table I lower 
energies than those found in [lCj. In all these cases al- 
ready PERM by itself showed that the topologies shown 
in [l(| are not the native ones. While the subsequent 
gradient descent improved the energies substantially, it 
changed in no case the overall topology. 

The latter is true also for d = 3, although there the 
subsequent minimization gave even larger energy changes 
than in d = 2. This shows that in d = 3, too, PERM is 
able to find states very close to the native ones. Since 



there exist no published ground state energies for the 
3 — d AB model, we are unable to compare PERM with 
other methods. 

The configurations corresponding to the energies 
shown in Table I are shown in Figs. 1 (for d = 2) and 
2 (for d = 3). For d = 2 we see that none of the configu- 
rations, except the one for N = 13, have single hydropho- 
bic cores. Instead, the hydrophobic (A) monomers form 
clusters of typically 4 to 5 particles. This is easily ex- 
plained by the fact that hydrophobic monomers always 
arc flanked by polar monomers along the chain. Thus 
a clean separation into hydrophobic and polar regions 
is impossible. This shows that the AB model with Fi- 
bonacci sequences would be a very poor model for real 
proteins in d = 2. From Fig. 3 we see that the same is 
true to a lesser degree in d = 3. There the chains with 
N = 21 and N — 34 fold into configurations with single 
hydrophobic cores (except for a single A monomer which 
keeps out in both cases), and only the chain with N = 55 
forms 2 clearly disjoint main hydrophobic groups. 

In conclusion we have extended the PERM algorithm 
to an off-lattice two-species protein model. We have 
shown that it performs well, indeed we are able to re- 
fute with it previous claims for putative ground states. 

The chosen model is not very realistic. Partly this 
follows from the restriction to two types of monomers, 
partly from the fact that we did not include, as in 
|llj. more realistic local (bond angle and torsion) forces, 
and partly from the restriction to Fibonacci sequences. 
Each of this features could have been easily avoided, and 
PERM works indeed equally well if we modify any of 
them. But it was not our aim to present a realistic model. 
Rather we wanted to treat a model which is suitable for 
benchmarking, because of it is defined in a simple way 
and because it was already studied in detail before. 

It is less obvious whether PERM would also perform 
well for all-atom models with realistic potentials, or even 
with explicit solvents. Typically, its performance de- 
creases quite rapidly with the number of degrees of free- 
dom, but presumably it shares this with other modern 
methods like multicanonical sampling and parallel tem- 
pering. To answer this question, we have started such 
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simulations with the ECEPP force field implemented in Acknowledgments: We thank Walter Nadler for nu- 
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